Unobtrusive methods for low-cost manual evaluation of machine translation
Abstract
Machine translation (MT) evaluation metrics based on n-gram co-occurrence statistics are financially cheap to execute, and their value in comparative research is well documented. However, their value as a standalone measure of MT output quality is questionable. In contrast, manual methods of MT evaluation are financially expensive. This paper presents early research carried out within the CNGL (Centre for Next Generation Localisation) on a low-cost, operationalised means of acquiring MT evaluation data in a commercial post-edited MT (PEMT) context. The approach exposes translators to output from a set of candidate MT systems and reports back on which system requires the least post-editing. It is hoped that this approach, combined with instrumentation mechanisms for tracking the performance and behaviour of individual post-editors, will give insight into which MT system, if any, out of a set of candidates is most suitable for a particular large or ongoing technical translation project. In the longer term, we propose that post-editing data gathered in a commercial context may also be valuable to MT researchers.
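As a rough illustration of the kind of ranking such post-editing data supports (a sketch, not the CNGL implementation), the fragment below scores each candidate system by the average token-level edit distance between its raw output and the post-edited version, normalised by post-edited length (an HTER-like approximation without shifts). All function names and the example data are hypothetical.

```python
# Minimal sketch: rank candidate MT systems by approximate post-editing effort,
# measured as token-level edit distance between raw MT output and its
# post-edited version, normalised by post-edited length (HTER-like, no shifts).
from typing import Dict, List, Tuple


def token_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    dp = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        prev, dp[0] = dp[0], i
        for j, r in enumerate(ref, start=1):
            cur = min(dp[j] + 1,          # deletion
                      dp[j - 1] + 1,      # insertion
                      prev + (h != r))    # substitution (free if tokens match)
            prev, dp[j] = dp[j], cur
    return dp[-1]


def post_edit_effort(mt_output: str, post_edited: str) -> float:
    """Edit distance normalised by post-edited length."""
    hyp, ref = mt_output.split(), post_edited.split()
    return token_edit_distance(hyp, ref) / max(len(ref), 1)


def rank_systems(segments: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, float]]:
    """segments maps system name -> list of (raw MT output, post-edited text)."""
    scores = {
        name: sum(post_edit_effort(mt, pe) for mt, pe in pairs) / len(pairs)
        for name, pairs in segments.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])  # least effort first


if __name__ == "__main__":
    data = {  # hypothetical post-editing logs for two candidate systems
        "system_A": [("the cat sat in mat", "the cat sat on the mat")],
        "system_B": [("cat the sat mat on", "the cat sat on the mat")],
    }
    for name, score in rank_systems(data):
        print(f"{name}: mean post-edit effort = {score:.2f}")
```

In practice the instrumentation described in the paper would also capture behavioural signals such as post-editing time and keystrokes per segment; the edit-distance score above is only one cheap proxy for effort.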
Similar papers
Survey of Machine Translation Evaluation
The evaluation of machine translation (MT) systems is an important and active research area. Many methods have been proposed to determine and optimize the output quality of MT systems. Because of the complexity of natural languages, it is not easy to find optimal evaluation methods. The early methods are based on human judgements. They are reliable but expensive, i.e. time-consuming and non-reu...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are at the core of Machine Translation (MT) engine development, as engines are developed on the basis of frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
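Validity studies of this kind typically reduce to correlating metric scores with human ratings. The sketch below computes a Pearson correlation over per-system scores; the numbers and variable names are purely illustrative, not data from the cited study.

```python
# Illustrative only: correlating an automatic metric's scores with human
# judgements, the usual way metric validity is assessed (data made up).
from math import sqrt
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-system metric scores and mean human adequacy ratings.
metric_scores = [0.31, 0.42, 0.27, 0.38]
human_ratings = [3.1, 3.9, 2.8, 3.6]
print(f"Pearson r = {pearson(metric_scores, human_ratings):.3f}")
```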
An Awkward Disparity between BLEU / RIBES Scores and Human Judgements in Machine Translation
Automatic evaluation of machine translation (MT) quality is essential in developing high quality MT systems. Despite previous criticisms, BLEU remains the most popular machine translation metric. Previous studies on the schism between BLEU and manual evaluation highlighted the poor correlation between MT systems with low BLEU scores and high manual evaluation scores. Alternatively, the RIBES me...
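For context, BLEU is built from clipped (modified) n-gram precision combined with a brevity penalty. The fragment below sketches only the core precision computation for a single sentence pair; it is a simplification for illustration, not the full metric.

```python
# Simplified sketch of BLEU's core idea: clipped (modified) n-gram precision
# for one hypothesis against one reference. Real BLEU geometrically averages
# over n = 1..4, applies a brevity penalty, and aggregates at corpus level.
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp: List[str], ref: List[str], n: int) -> float:
    hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
    # Clip each hypothesis n-gram count by its count in the reference.
    clipped = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
    total = max(sum(hyp_ngrams.values()), 1)
    return clipped / total


hyp = "the the the cat".split()
ref = "the cat is on the mat".split()
print(modified_precision(hyp, ref, 1))  # clipping stops 'the' from over-counting
```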
iAppraise: A Manual Machine Translation Evaluation Environment Supporting Eye-tracking
We present iAppraise: an open-source framework that enables the use of eye-tracking for MT evaluation. It connects Appraise, an open-source toolkit for MT evaluation, to a low-cost eye-tracking device to make its usage accessible to a broader audience. It also provides a set of tools for extracting and exploiting gaze data, which facilitate eye-tracking analysis. In this paper, we describe diff...
Dynamic Terminology Integration Methods in Statistical Machine Translation
In this paper the author presents methods for dynamic terminology integration in statistical machine translation systems using a source text pre-processing workflow. The workflow consists of exchangeable components for term identification, inflected form generation for terms, and term translation candidate ranking. Automatic evaluation for three language pairs shows a translation quality improv...
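The workflow described (term identification, inflected-form generation, candidate ranking) is not reproduced here, but a minimal sketch of the first step, marking known source-language terms against a small term bank before translation, might look as follows. The annotation format and the term bank entries are invented for illustration.

```python
# Hypothetical sketch of the term-identification step in a terminology-aware
# pre-processing workflow: longest-match lookup of source phrases in a term
# bank, with in-line annotation of the preferred target translation.
from typing import Dict, List

TERM_BANK: Dict[str, str] = {          # invented example entries (en -> de)
    "machine translation": "maschinelle Übersetzung",
    "post-editing": "Nachbearbeitung",
}


def annotate_terms(sentence: str, term_bank: Dict[str, str]) -> str:
    tokens: List[str] = sentence.split()
    out: List[str] = []
    i = 0
    max_len = max(len(t.split()) for t in term_bank)
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):  # longest match first
            phrase = " ".join(tokens[i:i + span]).lower()
            if phrase in term_bank:
                out.append(f'<term tgt="{term_bank[phrase]}">{phrase}</term>')
                i += span
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(annotate_terms("Statistical machine translation needs post-editing", TERM_BANK))
```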